Bitexts as Semantic Mirrors
Abstract
The importance of parallel corpora in machine translation research is undisputed. Research on data-driven techniques in MT has grown tremendously since the first automatic alignment techniques were introduced in the early 1990s, which finally made it possible to create large and reasonably clean bitexts from scratch without human intervention. Nowadays, it is hard to imagine an era before statistical machine translation, and Google Translate is at everybody's fingertips. However, there is more to parallel data than just translation. Enthusiastic researchers in the early days kept coming up with new ideas for using parallel data in problems of natural language understanding and processing. Some of these ideas have almost disappeared in the flood of SMT papers coming out every year. This paper tries to remind us of some other applications that illustrate the amazing utility of parallel corpora.
Similar resources
Conceptual Exploration of Semantic Mirrors
The “Semantic Mirrors Method” (Dyvik, 1998) is a means for automatic derivation of thesaurus entries from a word-aligned parallel corpus. The method is based on the construction of lattices of linguistic features. This paper models the Semantic Mirrors Method with Formal Concept Analysis. It is argued that the method becomes simpler to understand with the help of FCA. This paper then investigat...
Text Rewriting Improves Semantic Role Labeling (Extended Abstract)
Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, often demanding linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to...
Text Rewriting Improves Semantic Role Labeling
Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, often demanding linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to...
Lightly-Supervised Training for Hierarchical Phrase-Based Machine Translation
In this paper we apply lightly-supervised training to a hierarchical phrase-based statistical machine translation system. We employ bitexts that have been built by automatically translating large amounts of monolingual data as additional parallel training corpora. We explore different ways of using this additional data to improve our system. Our results show that integrating a second translatio...
Automatic Construction of Chinese-English Translation Lexicons
The process of constructing translation lexicons from parallel texts (bitexts) can be broken down into three stages: mapping bitext correspondence, counting co-occurrences, and estimating a translation model. State-of-the-art techniques for accomplishing each stage of the process had already been developed, but only for bitexts involving fairly similar languages. Correct and efficient implementa...
Publication date: 2013